Maximizing Data Efficiency of HTR Models by Synthetic Text


Authors

Marco Peer (TU Wien)*; Florian Kleber (TU Wien); Robert Sablatnig (TU Wien); Markus Muth (TU Wien)
mpeer@cvl.tuwien.ac.at*; kleber@cvl.tuwien.ac.at; sab@cvl.tuwien.ac.at; mmuth73@gmail.com

Abstract

The usability of synthetic handwritten text for improving machine learning models is assessed for the domain of handwritten text recognition (HTR). Synthetic handwritten text is generated using an existing model based on a generative adversarial network (GAN). The output of this model is then used to train a state-of-the-art HTR model, which is subsequently applied to recognize real datasets. Training on synthetic data alone results in a character error rate (CER) of 28.3 % and a word error rate (WER) of 65.5 % for line images of the IAM dataset, more than three times higher than the state-of-the-art result. However, our experiments show that the amount of real data in a mixed training set can be reduced by 70-80 % while still achieving CER and WER comparable to those obtained with real data only. Using only 10 % of the training data (113 images) from the CVL dataset results in a CER of 54.5 % and a WER of 88.8 %, whereas pre-training the model with synthetic data results in a CER of 14.6 % and a WER of 43.4 %.
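For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how CER and WER are commonly computed: the Levenshtein edit distance between the recognized text and the ground truth, divided by the length of the ground truth (in characters for CER, in words for WER). The function names `edit_distance`, `cer`, and `wer` are illustrative, and the exact normalization and tokenization used in the paper's evaluation are not stated in the abstract, so whitespace tokenization and per-line normalization are assumptions here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate, assuming whitespace tokenization (an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    truth = "the quick brown fox"
    recognized = "the quick brwon fox"
    print(f"CER: {cer(truth, recognized):.3f}")
    print(f"WER: {wer(truth, recognized):.3f}")
```

Because a single wrong character makes the whole word count as an error, WER is typically much higher than CER for the same output, which matches the gap between the two figures reported in the abstract.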